Frontiers in Genetics

Frontiers Media SA

Preprints posted in the last 7 days, ranked by how well they match Frontiers in Genetics's content profile, based on 197 papers previously published here. The average preprint has a 0.33% match score for this journal, so anything above that is already an above-average fit.

1
Daily feeding rhythms may play a role in the genetic variability of feed efficiency in growing pigs

Gilbert, H.; Foury, A.; Agboola, L.; Devailly, G.; Gondret, F.; Moisan, M.-P.

2026-04-21 zoology 10.64898/2026.04.17.719142 medRxiv
Top 0.6%
7.0%

Improving feed efficiency in pigs is essential for reducing production costs and environmental impacts. This study examines the influence of circadian feeding rhythms and genetic polymorphisms on feed efficiency variability using two pig lines divergently selected for Residual Feed Intake (RFI) over ten generations. Feeding behavior was monitored using automatic concentrate dispensers, recording 6,494,097 visits from 3,824 pigs to analyze meal frequency, duration, and diurnal patterns. LRFI pigs ate less frequently, with larger meals and longer durations. They exhibited two distinct feeding peaks, one around 8:00 AM and a higher one at 5:00 PM, and they consumed more feed during the diurnal period and less at night. HRFI pigs showed a smoother, less rhythmic feeding behavior with increased nocturnal intake. The differences between the two RFI lines became more pronounced as the number of generations of selection increased, suggesting a genetic basis. Feeding behaviors, including intake during the two main diurnal peaks, were found to be heritable (heritability estimates: 0.30-0.40), and genetic correlations were observed between feed intake and RFI, especially for intake between the two peaks. We then investigated the evolution of allele frequencies of single nucleotide polymorphisms (SNPs) in DNA sequences surrounding 10 core clock genes (ARNTL, CLOCK, CRY1, CRY2, NPAS2, NR1D1, PER1, PER2, PER3, RORA) across generations of selection. SNPs with significant frequency changes were mapped to regulatory regions and transposable elements, especially in the HRFI line, suggesting potential functional impacts on circadian regulation. These results underscore the role of feeding behavior and genetic variation in feed efficiency, offering insights for breeding programs aimed at improving metabolic efficiency and sustainability in pig production.

2
Proteomic Insights into Lp(a) Cardiovascular Mechanisms: A Mendelian Randomization Study

Tomasi, J.; Xu, H.; Zhang, L.; Carey, C. E.; Schoenberger, M.; Yates, D. P.; Casas, J.

2026-04-22 genetic and genomic medicine 10.64898/2026.04.20.26351299 medRxiv
Top 1%
4.4%

Background: Elevated lipoprotein(a) [Lp(a)] is a known risk factor for several cardiovascular-related diseases, as established by multiple genetic and observational studies. However, the underlying mechanisms mediating the effects of Lp(a) levels on cardiovascular disease risk and major adverse cardiovascular events (MACE) are unclear. The aim of this study was to identify proteins downstream of Lp(a) using Mendelian randomization (MR), a genetic causal inference approach. Methods: A two-sample MR was performed by initially identifying Lp(a) genetic instruments based on data from genome-wide association studies (GWAS) of Lp(a) blood concentrations. These instruments were then tested for association with proteins from proteomic pQTL data (Olink from UK Biobank, 2940 proteins; SomaScan from deCODE, 4907 proteins). Results: A total of 521 proteins associated with Lp(a) were identified. Using pathway enrichment analysis, the following MACE-relevant pathways were identified, comprising a total of 91 Lp(a) downstream proteins: oxidized phospholipid-related, chemotaxis of immune cells and endothelial cell activation, pro-inflammatory monocyte activation, neutrophil activity, coagulation, and lipid metabolism. Conclusion: The results suggest that the influence of Lp(a) treatments is primarily through modifying inflammation rather than lipid-lowering, thus providing insight into the mechanistic framework which mediates the effects of elevated Lp(a) on atherosclerotic cardiovascular disease.
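
For readers unfamiliar with two-sample MR, the core arithmetic can be sketched as below. The abstract does not state which estimator the authors used; this sketch assumes the common fixed-effect inverse-variance-weighted (IVW) estimator, and the function name and all numeric inputs are hypothetical illustrations, not values from the study.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the exposure -> outcome effect.

    beta_exposure: per-instrument SNP effects on the exposure (e.g. Lp(a)).
    beta_outcome:  per-instrument SNP effects on the outcome (e.g. a protein).
    se_outcome:    standard errors of the outcome effects.
    Assumes no instrument has a zero effect on the exposure.
    """
    # Each instrument gives a Wald ratio; its weight is the inverse of the
    # (first-order) variance of that ratio.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, sqrt(1.0 / total)


# Hypothetical summary statistics for three instruments (illustrative only).
estimate, std_err = ivw_estimate([0.1, 0.2, 0.4], [0.05, 0.10, 0.20], [0.01, 0.01, 0.01])
print(f"IVW estimate: {estimate:.3f} (SE {std_err:.3f})")
```

In a proteome-wide scan such as the one described, a calculation of this kind would be repeated once per protein, followed by multiple-testing correction.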

3
Network-Based Functional Fragility Reveals System-Level Reorganization Of The Gut Microbiome In Inflammatory Bowel Disease

Kenavdekar, M. V.; Natarajan, E.

2026-04-21 bioinformatics 10.64898/2026.04.16.719113 medRxiv
Top 1%
4.2%

The human gut microbiome plays a critical role in host health, yet its functional organization in disease remains poorly understood. Most studies focus on taxonomic composition or pathway abundance, which fail to capture higher-order interactions governing system-level behavior. Here, we investigated microbiome functional organization in inflammatory bowel disease (IBD), including Crohn's disease (CD), ulcerative colitis (UC), and healthy controls (HC), using a network-based framework across 60 metagenomic samples. Functional pathway profiles were used to construct correlation-based interaction networks, followed by analysis of network topology, functional redundancy, keystone pathway architecture, and system robustness. Disease-associated networks (CD and UC) exhibited reduced global connectivity, increased modular fragmentation, and centralization of keystone pathways, indicating a shift from distributed organization to more fragmented and fragile network structures compared to healthy controls. Notably, machine learning models demonstrated that network-derived features achieved higher classification performance (accuracy up to 0.824) compared to redundancy-based measures. These findings reveal that microbiome dysfunction in IBD is driven by large-scale reorganization of functional interaction networks rather than loss of functional capacity. This study highlights the importance of network-level analysis in understanding microbiome-associated disease and provides a systems-level framework for future research.
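
A correlation-based interaction network of the kind described can be sketched as follows. The abstract gives no correlation measure or cutoff, so Pearson correlation and the 0.6 threshold are assumptions, and the pathway names and abundances are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def build_network(profiles, threshold=0.6):
    """Link two pathways when |r| across samples meets the (assumed) threshold."""
    return {
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if abs(pearson(profiles[a], profiles[b])) >= threshold
    }


def density(n_nodes, edges):
    """Fraction of possible pathway pairs that are connected."""
    possible = n_nodes * (n_nodes - 1) / 2
    return len(edges) / possible if possible else 0.0


# Hypothetical pathway abundances across four samples.
profiles = {
    "pwy_A": [1.0, 2.0, 3.0, 4.0],
    "pwy_B": [2.0, 4.0, 6.0, 8.5],
    "pwy_C": [4.0, 1.0, 3.0, 2.0],
}
edges = build_network(profiles)
print(edges, density(len(profiles), edges))
```

Building one such network per group (CD, UC, HC) and comparing summary measures such as density is one simple way to quantify the "reduced global connectivity" the abstract reports.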

4
A variance QTL approach to uncover gene-fish oil supplement interaction loci for 14 circulating unsaturated fatty acid traits

Ihejirika, S. A.; Stephen, E.; Ye, K.

2026-04-20 genetic and genomic medicine 10.64898/2026.04.13.26350791 medRxiv
Top 2%
3.3%

Gene-environment interactions (GEI) contribute to circulating polyunsaturated fatty acid (PUFA) and monounsaturated fatty acid (MUFA) profiles. GEI may partly explain differences in trait variance across genotype groups. To identify GEI for circulating unsaturated fatty acids, we adopted a two-stage strategy. First, we detected quantitative trait loci associated with trait variance (vQTLs). Second, we tested these vQTLs for interaction with fish oil supplements (FOS). We performed genome-wide vQTL screens for 14 plasma PUFA and MUFA phenotypes in a UK Biobank subset of 200,478 participants. At the genome-wide significance threshold (p < 5.0 × 10^-8), we identified 172 vQTL-trait pairs across all 14 traits, and 16 of these vQTLs had no marginal genetic effect on the corresponding trait. We found 46 non-overlapping loci across all phenotypes, with an average of 12 vQTLs per trait. Omega-6% and PUFA% had the most independent vQTLs (N = 24), while DHA% and Omega-3% had the fewest (N = 1 and 2, respectively). For each of the 172 vQTL-trait pairs, we tested the interaction effect of the vQTL with FOS on the corresponding trait. We found six significant interaction signals in DHA, DHA%, Omega-3, Omega-3%, LA, and Omega-6/Omega-3 ratio around the FADS1/2, ZPR1, and SUGP1/TM6SF2 genes. Our results provide a comprehensive resource of vQTLs and gene-FOS interactions shaping the circulating levels of unsaturated fatty acids.
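
The first stage, testing whether trait variance differs by genotype, can be illustrated with a small sketch. The abstract does not name the variance test used; the Brown-Forsythe (median-centred Levene) statistic below is one common choice and is an assumption here, as are the function name and the toy trait values.

```python
from statistics import mean, median


def brown_forsythe_w(groups):
    """Brown-Forsythe W statistic for equality of variances across groups.

    groups: one list of trait values per genotype class (e.g. 0, 1, 2 copies
    of the alternate allele). Larger W suggests the spread of the trait
    differs between genotype classes.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each value from its own group's median.
    z = [[abs(v - median(g)) for v in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    grand_mean = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - grand_mean) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    if within == 0:
        raise ValueError("no within-group variation in deviations; W is undefined")
    return (n_total - k) / (k - 1) * between / within


# Toy data: the trait's spread widens with each extra copy of the allele.
w = brown_forsythe_w([[1, 2, 3], [0, 2, 4], [-3, 2, 7]])
print(f"W = {w:.3f}")
```

Under the null hypothesis W is compared against an F distribution with (k - 1, N - k) degrees of freedom; loci passing the genome-wide threshold would then move on to the second-stage interaction test.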

5
Single-Plant Genome-Wide Association Study Identifies Loci Controlling Multiple Vegetative Architecture Traits in Cultivated Northern Wild Rice (Zizania palustris L.)

McGilp, L.; Millas, R.; Mickelson, A.; Shannon, L. M.; Kimball, J.

2026-04-19 genomics 10.64898/2026.04.15.718548 medRxiv
Top 3%
3.0%

Cultivated Northern Wild Rice (Zizania palustris L.) is an obligately outcrossing, self-incompatible cereal grown in aquatic paddies in the United States. Genetic improvement has relied primarily on phenotypic recurrent selection, and genomic approaches remain largely unexplored in this emerging crop. We applied a single-plant genome-wide association study (sp-GWAS) framework to dissect vegetative architecture traits in five open-pollinated cultivated populations evaluated across three years (n = 2,173 plants). Plant height (PH), basal stem width (BSW), primary stem width (PSW), flag leaf length (FLL), and flag leaf width (FLW) were analyzed using a mixed linear model accounting for population structure and kinship. Broad-sense heritability ranged from 0.03 to 0.34, and year effects explained up to 54% of phenotypic variance, indicating strong environmental influence. After filtering 73,363 SNPs, genome-wide linkage disequilibrium decayed rapidly (r² = 0.1 at ~2.3 kb). A total of 124 significant SNPs (FDR < 0.01) were consolidated into 98 loci, of which 46 were associated with multiple traits and 11 were shared across four traits. Candidate genes near multi-trait loci included conserved regulatory classes implicated in grass architecture, including HLH/bHLH transcription factors. Diplotype analyses at candidate loci revealed both simple biallelic and complex multi-allelic haplotype structures, indicating that locus-level haplotype effects underlie several GWAS signals. Results demonstrate that sp-GWAS can detect statistically robust associations in a highly heterozygous, non-replicable crop system and suggest a polygenic, coordinated genetic architecture governing vegetative growth. These findings support genomic prediction and multi-trait selection strategies to accelerate improvement of cultivated Northern Wild Rice.

Plain Language Summary: Cultivated Northern Wild Rice is an important specialty crop grown in flooded paddies in the United States. Unlike many major crops, it is naturally outcrossing and highly variable, which makes traditional breeding challenging and slow. Most improvement efforts have relied on selecting plants based only on how they look in the field, and genomic tools have rarely been used. In this study, we used DNA markers to better understand the genetics behind plant structure traits such as plant height, stem thickness, and leaf width. We evaluated more than 2,000 plants from five cultivated populations over three growing seasons. Because weather and growing conditions strongly influence these traits, we used statistical models to separate environmental effects from genetic effects. We identified 98 regions of the genome associated with variation in plant structure. Many of these regions influenced more than one trait, showing that plant height, stem strength, and leaf size are genetically connected. Several regions contained genes similar to those known to control plant growth and development in other grasses. We also found that, in some cases, combinations of nearby DNA variants (haplotypes) explained trait differences better than single genetic markers. Overall, this work shows that modern genomic tools can successfully identify useful genetic variation in cultivated Northern Wild Rice, even though it is highly outcrossing and genetically diverse. These results provide a foundation for using genomic selection to improve plant structure, lodging resistance, and overall performance in breeding programs.

Core Ideas:
- Single-plant GWAS successfully detects genetic associations in obligately outcrossing cultivated Northern Wild Rice where conventional replicated mapping populations are impractical.
- Vegetative architecture traits exhibit low heritability but retain recoverable polygenic signal, where nearly half of detected loci influence multiple architecture traits, indicating integrated developmental control.
- Genome-wide linkage disequilibrium decays rapidly (~2.3 kb), consistent with expectations for an obligately outcrossing species and supporting relatively localized association signals.
- Candidate genes include conserved regulatory classes (TE1-like, HLH/bHLH, SPL).
- Given extensive overlap between QTL and environmental effect, multi-trait, multi-environment genomic prediction provides a pragmatic breeding strategy to improve canopy efficiency, lodging resistance, and harvestability in aquatic production systems.

6
Micro-Doppler Radar Identifies Movement Asymmetries After Anterior Cruciate Ligament Reconstruction

Onks, C. A.; Zeng, C.; Creath, R.; Simone, B. D.; Nyland, J. E.; Murphy, T. E.; Kishel, L. A.; Ardat, B. A.; Venezia, V. A.; Wiggins, A. M.; Shaffer, B. R.; Narayanan, R. M.

2026-04-21 sports medicine 10.64898/2026.04.15.26350397 medRxiv
Top 3%
2.7%

Background: Patients who have undergone Anterior Cruciate Ligament Reconstruction (ACLR) have a 6-24% chance of either re-tearing or having subsequent knee surgery. To date there have been no practical validated risk prediction models that can be easily implemented into clinical workflow for re-injury risk. Micro-Doppler radar (MDR) provides a promising solution. Objective: The purpose of this study was to investigate the predictive ability of MDR to identify persons with a previous ACLR relative to an age and sex matched healthy control. Methods: ACLR patients (n=81) and controls (n=100) performed drop box jump, sit to stand (STS), and walking trials as MDR signatures were collected. A 1D Convolutional Neural Network was developed to evaluate each activity individually followed by the development of a fusion model validation using all three activities. Results: The STS model individually achieved the highest overall accuracy of 82.3%, with a sensitivity of 71.6% and specificity of 91.0%. The fusion model using all activities achieved a peak overall accuracy to detect ACLR of 86.2%, 80.3% sensitivity, and 91% specificity. Conclusions: Currently, there is no clinically validated, efficient approach to objectively evaluate human motion at the point of care. When coupled with machine learning, MDR accurately differentiates ACLR from control groups by identifying complex biomechanical asymmetries, with classification performance comparable to or exceeding that of motion capture. Future research is needed to determine if MDR can be used in conjunction with risk prediction modeling.

Key points: Micro-Doppler radar provides a promising new solution to identify important human motion asymmetries in clinical settings. Here we evaluated a group of patients who have a history of Anterior Cruciate Ligament reconstruction versus a control group. Simple movements performed in the presence of the micro-Doppler radar system were used to identify the 2 groups with accuracy comparable or superior to motion capture systems.
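
The relationship between the reported accuracy, sensitivity, and specificity can be checked with a short sketch. The abstract does not report a confusion matrix; the counts below are back-calculated assumptions chosen to be consistent with the stated group sizes (81 ACLR, 100 controls) and the STS model's percentages, and are not data from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of ACLR participants correctly flagged
    specificity = tn / (tn + fp)  # share of controls correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts (an assumption, not reported in the abstract):
# 58 of 81 ACLR participants and 91 of 100 controls classified correctly.
sens, spec, acc = classification_metrics(tp=58, fn=23, tn=91, fp=9)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```

Because accuracy is a weighted average of sensitivity and specificity, with weights set by the group sizes, the larger control group pulls overall accuracy toward the specificity figure.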

7
Genome-wide identification and characterization of the NAC transcription factor family in Cynodon dactylon and their expression during abiotic stresses

Poudel, A.; Wu, Y.

2026-04-20 bioinformatics 10.64898/2026.04.15.718725 medRxiv
Top 3%
2.4%

Common bermudagrass (Cynodon dactylon) is a highly resilient and cosmopolitan grass widely used for turf, forage, and soil stabilization. Although its genome has been sequenced, few studies have focused on characterizing genes underlying its resilience, including the NAC transcription factor family, which is well known for its physiological and stress-related functions. This study aimed to systematically characterize NAC TF genes in the bermudagrass genome and assess their potential roles in abiotic stress tolerance. A total of 237 CdNAC genes were identified and phylogenetically classified into 14 groups, including 40 members in the NAM/NAC1 class, which is associated with plant growth and development, and 23 members in the SNAC class, which is associated with stress responses. Tissue-specific RNA-seq analysis indicated that about one-fourth of CdNAC genes were expressed across all tissues, whereas 13 genes showed relatively higher expression in roots and 9 in inflorescence, suggesting both essential and specialized functions. Stress-responsive expression profiling revealed that 35 CdNAC genes were upregulated in response to drought, 43 to heat, 10 to salt, and 42 to submergence stress. Notably, CdNAC122, 149, and 155, members of the SNAC class, were consistently upregulated across all stress conditions, while others exhibited stress-specific expression, such as CdNAC37, 130, 145, and 199 in drought, CdNAC7, 12, 18, and 29 in heat, CdNAC46 and 151 in salt, and CdNAC9 and 31 in submergence. In contrast, 53 genes were downregulated during different stresses, with most belonging to NAM/NAC1, TERN, or OsNAC7 classes, possibly reflecting suppression of photosynthesis and development-related processes under stress. These results provide the first comprehensive characterization of CdNAC genes, reveal their distinct regulatory roles in abiotic stress responses, and establish a foundation for future functional validation and applications in breeding of stress-resilient bermudagrass.

8
Ensemble Approaches to Screening, Diagnosis, and Subtyping of Multiple Sclerosis

Yang, I. Y.; Patil, A.; Jin, O.; Loud, S.; Buxhoeveden, S.; Zhang, D. Y.

2026-04-21 genetic and genomic medicine 10.64898/2026.04.19.26351230 medRxiv
Top 3%
2.1%

Multiple sclerosis (MS) is a debilitating disease affecting more than 1 million Americans, and today is assessed primarily through magnetic resonance imaging (MRI) and observational clinical symptoms. Given the autoimmune nature of MS, we hypothesized that high-dimensional gene expression data from peripheral blood mononuclear cells (PBMCs), when analyzed with the assistance of AI, may collectively serve as valuable biomarkers for the real-time risk and progression of MS. Here, we present PBMC RNA sequencing (RNAseq) results from N=997 samples, including 540 MS, 221 neuromyelitis optica (NMO), and 149 healthy controls. We constructed and optimized ensemble models for three clinical outcomes: (1) discrimination of early MS (EDSS ≤ 2.0) from healthy individuals with 74% AUC at 100% coverage, (2) differential diagnosis of MS from NMO with 91% AUC at 80% coverage, and (3) subtyping RRMS from progressive MS with 79% AUC at 80% coverage. To our knowledge, no prior molecular test has been reported for any of these three MS clinical tasks, and these results may have immediate impact on clinical management of MS patients. Two innovations improved the stratification accuracy of our models: selection of gene sets based on expression variance in disease states, and use of non-linear rank sort and conviction weighting in the ensemble score calculation.

9
A phylogenetic approach reveals evolutionary aspects and novel genes of bradyzoite conversion in Toxoplasma gondii

C A, A.; Upadhayay, R.; Patankar, S. A.

2026-04-21 bioinformatics 10.64898/2026.04.20.719551 medRxiv
Top 4%
1.9%

Toxoplasma gondii is a widespread human pathogen that has multiple, clinically relevant stages in its complex life cycle, including fast-replicating tachyzoites and latent bradyzoites. Bradyzoite differentiation is triggered by stress responses that lead to changes in transcription, translation, and metabolism. Two aspects of this process are addressed in this report: first, whether proteins that play roles in bradyzoite differentiation are specific to T. gondii and other bradyzoite-forming parasites of the Sarcocystidae family, and second, whether new bradyzoite differentiation proteins can be identified in T. gondii. To answer these questions, a phylogenetic approach was used, comparing proteomes of select members of the Sarcocystidae family that form morphologically different bradyzoite cysts and members of the Eimeriidae family that do not form cysts. This approach resulted in 8 distinct clusters of T. gondii proteins that reflected different conservation patterns; for example, one cluster showed conservation among all organisms, while another showed conservation in bradyzoite cyst-forming organisms. Known T. gondii proteins involved in bradyzoite differentiation were found in all clusters, indicating that this process uses both highly conserved pathways as well as bradyzoite-specific pathways. Importantly, the cluster containing proteins that are conserved in bradyzoite-forming organisms contained several known regulators of bradyzoites, and will be a source for identifying novel T. gondii proteins that are involved in bradyzoite differentiation.

10
A Seychelles warbler genomic toolkit

Lee, K. G. L.; Bartleet-Cross, C.; Gonzalez-Mollinedo, S.; Dong, S.; Pinto, A.; Lee, C. Z.; Sparks, A.; van de Velde, M.; Manarelli, M.-E.; Holden, T.; Tucker, R.; Maher, K. H.; Hipperson, H.; Slate, J.; Komdeur, J.; Richardson, D.; Dugdale, H.; Burke, T.

2026-04-21 genomics 10.64898/2026.04.16.719046 medRxiv
Top 4%
1.8%

Understanding evolutionary processes is greatly facilitated by high-quality data on genetic variation. We report the development of a genomic toolkit for a recently bottlenecked, long-term studied species, the Seychelles warbler (Ptimerl dezil; Acrocephalus sechellensis). This toolkit comprises a reference genome assembled into 31 chromosomes, together with functional annotations and reference-panel-free imputation of whole-genome sequences from 1,935 individuals. The genomic data have been used to assign the sequenced individuals into a genetic pedigree. Individual genomic data are associated with a suite of phenotypic metadata, amassed from three decades of fieldwork in this closed, long-term monitored population. We compared sex and parentage assigned using the genomic data with the previously recorded sex and parentage metadata to identify and correct 41 DNA samples labelled with the wrong identity. This population resource enables a wide range of analyses that include, but are not limited to, phylogenetics, metabarcoding, recombination rates, linkage patterns, adaptation, heritability, demographic history, selection, and inbreeding estimates. We wish to encourage interest from researchers seeking to collaborate on future analyses and data collection. Overall, our methods demonstrate the potential of next generation sequencing and statistical tools to provide dense genomic datasets at large sample sizes for wild populations.

11
Pan1c: a pipeline to easily build chromosome-level pangenome graphs

Mergez, A.; Racoupeau, M.; Bardou, P.; Linard, B.; Legeai, F.; Choulet, F.; Gaspin, C.; Klopp, C.

2026-04-21 bioinformatics 10.64898/2026.04.17.719212 medRxiv
Top 4%
1.8%

Advances in sequencing technologies and the availability of high-quality genome assemblies for many genotypes per species offer the opportunity to improve sequence alignment rate and quality, as well as variant calling accuracy, by including all genomic variations in a graph reference called a pangenome graph. Because the process of building and analysing a pangenome graph is still complex, with related software packages under development, there is an important need for user-friendly pipelines in this emerging research area. Pan1C is a pipeline based on a chromosome-by-chromosome graph construction strategy. It integrates two complementary strategies for building pangenomes and produces informative metric plots and graphics using a large set of tools. By benchmarking Pan1C on human, fungal, and wheat assemblies, which span a wide range of genome sizes and complexities, we showed the value of Pan1C for assembly and graph validation as well as for performing primary analyses.

12
Diminished sex hormone levels influence the risk of skewed X chromosome inactivation

Roberts, A. L.; Osterdahl, M. F.; Sahoo, A.; Pickles, J.; Franklin-Cheung, C.; Wadge, S.; Mohamoud, N. A.; Morea, A.; Amar, A.; Morris, D. L.; Vyse, T. J.; Steves, C. J.; Small, K. S.

2026-04-22 genetic and genomic medicine 10.64898/2026.04.20.26351303 medRxiv
Top 4%
1.8%

Background: X chromosome inactivation (XCI) is the mechanism which randomly silences one X chromosome to equalise gene expression between 46, XX females and 46, XY males. Though XCI is expected to result in a random pattern of mosaicism across tissues, some females display a significantly unbalanced ratio in immune cells, termed XCI-skew, in which ≥75% of cells have the same X inactivated. XCI-skew is associated with adverse health outcomes and its prevalence increases with age - particularly after midlife - yet the specific risk factors have yet to be identified. The menopausal transition, which is driven by profound shifts in sex hormone levels, has significant impact on chronic disease risk yet the molecular and cellular effects are incompletely understood. We hypothesised that the menopausal transition may impact XCI-skew. Methods: Using XCI data measured in blood-derived DNA from 1,395 females from the TwinsUK population cohort, along with questionnaires, genetic data, and sex hormone measures, we carried out a cross-sectional study to assess the impact of the menopausal transition and sex hormones on XCI-skew. Results: We demonstrate that early menopause (<45yrs) is associated with increased risk of XCI-skew. In subset analyses across those who had a surgically induced or natural menopause, we find the association restricted to those who underwent a surgical menopause. We next show that a low polygenic score (PGS) for testosterone levels is significantly associated with XCI-skew, which we replicate in an independent dataset (n=149), while a PGS for age at natural menopause is not associated. Finally, using longitudinal measures across two time points spanning ~18 years we show XCI-skew is a stable cellular phenotype that typically increases over time. Discussion: These data represent the first environmental and genetic risk factors of XCI-skew, both of which implicate endogenous sex hormone levels, particularly testosterone. We propose XCI-skew may have clinical relevance in postmenopausal females.

13
High-variance phenome database reveals important roles of WD40 proteins in the plant pathogenic fungus Fusarium graminearum

Choi, S.; Lee, N.; Jeon, H.; Park, J.; Kim, S.; Kim, J.-E.; Shin, J.; Moon, H.; Min, K.; Choi, Y.; Hwangbo, A.; Kim, H.; Choi, G. J.; Lee, Y.-W.; Song, D.-G.; Son, H.

2026-04-20 molecular biology 10.64898/2026.04.19.719521 medRxiv
Top 5%
1.6%

- WD40 is a highly conserved protein domain in eukaryotes, playing a critical role in various cellular processes.
- We conducted genome-wide functional analysis of WD40 genes in Fusarium graminearum, a phytopathogenic fungus that causes severe yield loss and mycotoxin contamination in major cereal crops.
- Comprehensive phenome analysis of 119 WD40 gene deletion mutants across 22 distinct phenotypic traits revealed phenotypic divergence within the phenome, establishing a strong correlation between virulence and sexual reproduction. Notably, 21 "core WD40 genes" were identified, offering valuable insights into divergent biological processes.
- Pilot interactome studies of Fgwd101 and Fgwd133 provided further insights into their potential pathobiological functions. Our investigation contributes to broadening our knowledge of the biological mechanisms underlying fungal pathogenesis and may assist in the identification of targets for antifungal agents.

14
In Silico study of clinical implication of markers associated with PTHrP regulatory mechanisms and linked to angiogenesis and EMT program of colorectal cancer

Carriere, P. M.; Novoa Diaz, M. B.; Birkenstok, C.; Gentili, C.

2026-04-20 cancer biology 10.64898/2026.04.15.718767 medRxiv
Top 5%
1.6%

Parathyroid hormone-related peptide (PTHrP), encoded by PTHLH, has been implicated in tumor progression through its involvement in epithelial-mesenchymal transition (EMT), angiogenesis, and tumor cell migration. Previous experimental studies suggest that PTHrP may promote these processes in colorectal cancer (CRC), partly through the modulation of factors such as secreted protein acidic and rich in cysteine (SPARC) and vascular endothelial growth factor (VEGFA). These events play a key role in the acquisition of an aggressive phenotype in our experimental models. In this study, we performed an integrative in silico analysis of multiple transcriptomic datasets to investigate the potential role of PTHLH in CRC. Differential expression analysis identified a set of consistently dysregulated genes across independent datasets. Functional enrichment and network analyses revealed that PTHLH expression is associated with biological processes related to extracellular matrix remodeling, EMT, and angiogenesis. Correlation analyses showed a positive association between PTHLH and SPARC expression, while network-based approaches suggested a potential functional connection with VEGFA. To assess the clinical relevance of these findings, survival analysis was performed using publicly available datasets. High expression levels of PTHLH, SPARC, and VEGFA were significantly associated with reduced overall survival in patients. Notably, a combined gene signature based on these three factors demonstrated a stronger prognostic effect than individual genes, indicating enhanced predictive value. These findings suggest that PTHrP is associated with molecular pathways involved in tumor progression and, together with SPARC and VEGF, may contribute to a coordinated regulatory axis with prognostic relevance in CRC, warranting further experimental validation.

15
The causes of signed linkage disequilibrium within genomic datasets

Stetsenko, R.; Merot, C.; Glemin, S.; Roze, D.

2026-04-21 genomics 10.64898/2026.04.17.719204 medRxiv
Top 6%
1.5%

Several recent studies have quantified signed linkage disequilibrium (LD) among mutations in genomic datasets, often reporting positive LD, particularly among mutations presumed to be less deleterious, such as synonymous variants. In this article, we investigate two potential sources of this positive LD: the focus on rare alleles, as adopted in several previous studies, and errors arising in the mapping of short-read sequences onto a reference genome. Using coalescent simulations, we extend previous theoretical results of the effect of focusing on rare alleles, and show that derived alleles present at similar frequencies tend to be in positive LD. Reanalyzing datasets from Capsella grandiflora and Drosophila melanogaster, we show that LD among synonymous derived alleles vanishes in the absence of any conditioning on frequency, while LD between mutations categorized as potentially deleterious by the SIFT4G program stays positive. However, we show that in both cases, this positive LD may be at least partly caused by the potential mismapping of a small fraction of sequences in some individuals, which could be a consequence of structural variants that are absent from the reference genome. Overall, these results show that average signed LD among mutations can be strongly affected by technical artifacts even if these concern only a minority of variants. Finally, we discuss other possible sources of positive LD among deleterious mutations.
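
As a reminder of what "signed" LD measures, here is a minimal sketch, assuming phased haplotypes coded 0 for the ancestral allele and 1 for the derived allele. The function name and toy haplotypes are hypothetical, and this is the textbook coefficient D rather than the specific averaged statistic used in the preprint.

```python
def signed_ld(haplotypes, i, j):
    """Signed LD coefficient D between the derived alleles at sites i and j.

    haplotypes: sequences of 0/1 values (1 = derived allele), one per haplotype.
    D > 0 means derived alleles co-occur on the same haplotype more often than
    expected from their frequencies; D < 0 means they tend to sit apart.
    """
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n
    p_j = sum(h[j] for h in haplotypes) / n
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n  # both derived together
    return p_ij - p_i * p_j


# Toy examples: derived alleles always together vs. never together.
coupled = [(1, 1), (1, 1), (0, 0), (0, 0)]
repulsed = [(1, 0), (0, 1), (1, 0), (0, 1)]
print(signed_ld(coupled, 0, 1), signed_ld(repulsed, 0, 1))
```

Averaging D over many pairs of sites gives a genome-wide summary, which is why, as the abstract notes, a small fraction of mismapped reads creating spurious co-occurring variants in the same individuals can push that average positive.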

16
GenePT Revisited: Do Better Text Embeddings Make Better Gene Embeddings?

Hedley, J. G.; Torr, P. H. S.; Märtens, K.

2026-04-20 genomics 10.64898/2026.04.16.718976 medRxiv
Top 6%
1.3%

GenePT introduced a simple recipe for gene representations: embed each gene's natural-language description with a general-purpose text embedding model and reuse the resulting vectors across downstream tasks. Since GenePT's release, embedding models have improved rapidly, with many strong open and commercial encoders benchmarked on suites such as the Massive Text Embedding Benchmark (MTEB). We present a controlled "leaderboard" study that keeps the GenePT pipeline fixed and varies only the embedding backbone. We benchmark contemporary encoders on four diverse gene embedding tasks: gene-gene interaction prediction, gene property classification, cell type classification, and prediction of transcriptomic responses to unseen genetic perturbations. Across these settings, newer backbones consistently outperform the original GenePT backbone (text-embedding-ada-002), achieving improvements of 1-17%, while enabling fully reproducible research by avoiding API dependencies.

17
Germline-mediated ubiquitous recombination in ScxCre male mice: implications for tendon research

Li, H.; Cao, C.

2026-04-21 genetics 10.64898/2026.04.16.719028 medRxiv
Top 7%
1.1%

Scleraxis (Scx), a basic helix-loop-helix (bHLH) transcription factor, is a primary marker for tendon and ligament lineages. Consequently, mouse models utilizing Cre recombinase under the control of the Scx locus represent a powerful tool for control of gene expression in tendon. The constitutive ScxCre mouse line is widely used for tendon-specific genetic manipulation. In this study, we demonstrate that ScxCre exhibits significant undesired off-target activity in the male germline, leading to ubiquitous recombination of floxed alleles in all tissues of the resulting offspring. This inheritance of recombined LoxP alleles occurs independently of Cre inheritance, indicating that ScxCre-induced recombination occurs prior to meiosis in diploid germ cells. This off-target activity is not observed in the female germline. These findings highlight a critical need for stringent parental sex selection when using ScxCre lines to ensure tissue-specific targeting and avoid unintentional global gene deletion or transgene activation.

18
T lymphocyte regulatory cytokines predict frailty in older adults

Akie, T. E.; Loew, E.; Huang, Z.; Neff, H. A.; Michaels, O. P.; Haran, J. P.

2026-04-20 immunology 10.64898/2026.04.16.716397 medRxiv
Top 7%
1.1%

Frailty is a multi-system syndrome causing increased susceptibility to health insults in older adults. Immune system dysregulation and inflammaging have emerged as mechanisms that may affect multiple organ systems in the frailty syndrome. The present study seeks to define the immune state in community-dwelling adults suffering from frailty. We evaluated a subgroup of 169 individuals enrolled in the Gut-brain Alzheimer's disease Inflammation and Neurocognitive Study (GAINS). Participants in the GAINS study were scored for frailty using the Clinical Frail Scale. A panel of 27 inflammatory cytokines was analyzed from the serum of each participant. Frailty was present in 33 participants (19.5% of the cohort), and was correlated with age, malnutrition, and cognitive assessments. Statistical analysis adjusting for clinical covariates revealed higher serum levels of IL-2, IL-10, and IL-17 in frail patients. Using machine learning classification, we developed a predictive model of frailty with strong discriminative performance (AUC 0.78). Individual element analysis via Shapley Additive Explanations (SHAP) revealed that inflammatory markers had the greatest influence on the model, and IL-7 was the single most important element in the prediction of frailty. Together, our data demonstrate a novel pattern in which T-cell regulatory inflammatory molecules act as mediators of frailty, implicating cellular immunity as a potential mechanism of dysfunctional aging.

19
Biobank-scale survey of gene-diet interactions informs precision nutrition polygenic scores

Di Scipio, M.; Man, A.; Lali, R.; Wu, J.; Le, A.; Franks, P. W.; Pare, G.

2026-04-20 genetic and genomic medicine 10.64898/2026.04.13.26350340 medRxiv
Top 7%
1.0%

Genome-guided dietary advice is a goal of precision nutrition. However, the contribution of gene-diet interactions (GxDs) to disease risk remains unclear, hindering the identification of diet-outcome pairs more likely amenable to genetic-based recommendations. We thus implemented a two-step approach: first, we comprehensively assessed the contributions of genome-wide GxDs to cardiometabolic outcomes across a broad array of dietary exposures in UK Biobank participants (N = 141,144 to 325,989). Second, we selected the 20 significant diet-outcome pairs from the 713 pairs tested (p < 7.0 × 10⁻⁵) and derived GxD polygenic scores. In an independent sample, all scores were nominally associated with their corresponding outcomes, with 12 of 20 polygenic scores Bonferroni significant (p < 0.0025). Further analyses revealed GxD polygenic scores were associated with clinical outcomes such as incident gout, suggesting translational potential. Altogether, these results showcase the promise of GxD scores to inform precision nutrition.
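
Both significance thresholds quoted in this abstract are consistent with a Bonferroni correction at a family-wise level of 0.05. The abstract names Bonferroni only for the second threshold and does not state the level, so the value 0.05 and its application to the 713-pair step are assumptions made here for illustration. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

# Assumed family-wise error rate; not stated in the abstract.
ALPHA = 0.05

# Step 1: 713 diet-outcome pairs tested.
discovery_threshold = bonferroni_threshold(ALPHA, 713)

# Step 2: 20 polygenic scores tested in the independent sample.
validation_threshold = bonferroni_threshold(ALPHA, 20)

print(f"discovery:  p < {discovery_threshold:.1e}")
print(f"validation: p < {validation_threshold:.4f}")
```

Under that assumption the two values round to the thresholds reported in the abstract.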

20
Mediation analysis in longitudinal data: an unbiased estimator for cumulative indirect effect

Li, Y.; Cabral, H.; Tripodis, Y.; Ma, J.; Levy, D.; Joehanes, R.; Liu, C.; Lee, J.

2026-04-20 epidemiology 10.64898/2026.04.18.26351189 medRxiv
Top 7%
1.0%

Mediation analysis quantifies how an exposure affects an outcome through an intermediate variable. We extend mediation analysis to capture the cumulative effects of longitudinal predictors on longitudinal outcomes. Our proposed model examines how mediators transmit the effects of the current and previous exposure on the current outcome. We construct a least-squares estimator for the cumulative indirect effect (CIE) and use three approaches (exact form, delta method, and bootstrap procedure) to estimate its standard error (SE). The estimator of the CIE is unbiased when there is no unmeasured confounding and the errors of the mediator model and the outcome model are independent at all time points, as shown in statistical inference and in simulations. While the three SE estimates are numerically similar, the bootstrap procedure is recommended due to its simplicity of implementation. We apply this method to the Framingham Heart Study offspring cohort to assess whether DNA methylation mediates the association of alcohol consumption with systolic blood pressure over two time points. We identify two CpGs (cg05130679 and cg05465916) as mediators and construct a composite DNA methylation score from 11 CpGs, which mediates 39% of the cumulative effect. In conclusion, we propose an unbiased estimator for the CIE. Future studies will investigate missingness in mediators and outcomes.
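
As a rough illustration of the general idea, the sketch below estimates an indirect effect at a single time point using the classical product-of-coefficients approach with a bootstrap standard error. It is not the cumulative estimator proposed in the preprint, whose exact form the abstract does not give. The simulated data, effect sizes, and function names are all assumptions made for the example.

```python
import numpy as np

def ols(design: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients for y ~ design (intercept column included)."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def indirect_effect(x: np.ndarray, m: np.ndarray, y: np.ndarray) -> float:
    """Product-of-coefficients indirect effect of x on y through m."""
    ones = np.ones_like(x)
    a = ols(np.column_stack([ones, x]), m)[1]      # exposure -> mediator
    b = ols(np.column_stack([ones, x, m]), y)[2]   # mediator -> outcome, given x
    return float(a * b)

def bootstrap_se(x, m, y, n_boot: int = 500, seed: int = 0) -> float:
    """Nonparametric bootstrap SE: resample individuals with replacement."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    return float(estimates.std(ddof=1))

# Simulated data (assumed values): true indirect effect = 0.5 * 0.4 = 0.2.
rng = np.random.default_rng(42)
n = 2000
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.3 * x + 0.4 * m + rng.normal(size=n)

estimate = indirect_effect(x, m, y)
se = bootstrap_se(x, m, y)
print(f"indirect effect: {estimate:.3f} (bootstrap SE {se:.3f})")
```

The bootstrap resamples whole individuals, so the mediator and outcome models are refit on the same resampled rows in each replicate.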